1.  Introduction

R. R. Bush and C. F. Mosteller*, and also W. K. Estes*,
have proposed a stochastic theory of learning. They suppose
that the organism makes a sequence of responses among a fixed
finite set of alternatives, and that there is a probability
$p_s(t)$ at moment $t$ that response $s$ will occur before moment $t+1$.
They suppose further that the probabilities $p_s(t+1)$ are
determined by the $p_s(t)$, the response $s_t$ actually made after
moment $t$, and the event $r_t$ that follows after response $s_t$.
Specifically, they assume the functional form:

1.1)  $p(t+1) = M^{r_t s_t}\, p(t),$

where $p(t)$ is the $m$-dimensional vector whose $s^{th}$ component is
$p_s(t)$, and $M^{rs}$ is a square stochastic matrix of order $m$ whose
elements depend only upon $r$ and $s$.
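
As an illustration of relation 1.1), the following is a minimal sketch in Python. It assumes that "stochastic" means each column of an operator sums to one (so that $Mp$ is again a probability vector), and the numerical entries of `OPERATORS` are made up for the example; none of these names or values come from the paper.

```python
def apply_operator(M, p):
    """Return the vector M p, with M given as a list of rows."""
    return [sum(row[j] * p[j] for j in range(len(p))) for row in M]


def is_column_stochastic(M, tol=1e-12):
    """True if all entries are non-negative and every column sums to one."""
    m = len(M)
    entries_ok = all(M[i][j] >= 0 for i in range(m) for j in range(m))
    columns_ok = all(abs(sum(M[i][j] for i in range(m)) - 1.0) <= tol
                     for j in range(m))
    return entries_ok and columns_ok


# Hypothetical operators for two responses (s = 1, 2) and two event
# classes (r = 0 non-reward, r = 1 reward); the numbers are illustrative.
OPERATORS = {
    (0, 1): [[0.8, 0.3], [0.2, 0.7]],
    (0, 2): [[0.7, 0.2], [0.3, 0.8]],
    (1, 1): [[0.9, 0.4], [0.1, 0.6]],
    (1, 2): [[0.6, 0.1], [0.4, 0.9]],
}


def trajectory(p1, history):
    """Apply relation 1.1) along a list of (r_t, s_t) pairs, starting at p(1)."""
    path = [list(p1)]
    for r, s in history:
        path.append(apply_operator(OPERATORS[(r, s)], path[-1]))
    return path


path = trajectory([0.5, 0.5], [(0, 1), (1, 2)])
```

Each element of `path` is the probability vector at one moment; the operator used at each step is selected by the event and the response that actually occurred.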


One especially interesting case is that in which there are
just two classes of events; for example, they might be reward
and non-reward. For the purposes of this paper, it will be
sufficient to consider only this case since the more general
case presents no added difficulty. Our object is to narrow the
class of allowable matrices $M^{rs}$ by making one more assumption
that seems quite reasonable.


*Numbers refer to Bibliography at the end of the paper.

A number of suggestions have already been made for
specializing the form of the matrices $M^{rs}$; several of these are
discussed in a recent paper by R. R. Bush and G. L. Thompson,*
both as to their mathematical form and as to their implications
from the standpoint of learning theory. It is entirely possible
that the proposal I am now making has already been considered
but, if so, it has not come to my attention.

2.  Background 

I like to think of the elements $M^{rs}_{ij}$ of the matrices $M^{rs}$ as
physical constants that are characteristic of the organism, just
as mass, color, and hardness may be thought of as physical
constants pertaining to an object. The values of the $M^{rs}_{ij}$ are to
be estimated on the basis of data from an appropriate experiment,
just as the mass of an object might be estimated from a set of
observations taken during an experiment with a spring-balance.
Furthermore, my theory is operationally well-defined only after
I specify some single analytical process for estimating the
constants, such as by averaging the observations.

This must be done by observing the organism in some situation
where a correspondence is set up between a sequence of observed
responses $s_t$ and events $r_t$ and the formal quantities $s_t$ and $r_t$
in the theory; the estimates $\hat M^{rs}_{ij}$ are necessarily functions of
the observations $s_t$ and $r_t$, though perhaps a different one for
each parameter. It does not matter that I am as yet unable to
write these functions simply; all that is necessary is that there
be a finite computational process that will yield the desired
estimates $\hat M^{rs}_{ij}$ in terms of the observations $s_t$ and $r_t$ for
$t = 1, 2, \ldots, N$. To accomplish this, we first let $\hat p(t+1)$ be defined
by the recursion relation

2.1)  $\hat p(t+1) = \hat M^{r_t s_t}\, \hat p(t)$  for $t = 1, 2, \ldots, N$,

where $\hat M^{rs}_{ij}$ and $\hat p_k(1)$ are defined to be the values of the parameters
$M^{rs}_{ij}$ and $p_k(1)$ that maximize the "likelihood"

2.2)  $L[M^{rs}, p_k(1)] = \prod_{t=1}^{N} p_{s_t}(t).$
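
To make the likelihood of 2.2) concrete, here is a small Python sketch that evaluates $\prod_t p_{s_t}(t)$ for one observed sequence. The operator values, the starting vector, and the function name `likelihood` are assumptions made for the example; the maximization over the parameters, which defines the estimates, is not attempted here.

```python
def likelihood(operators, p1, responses, events):
    """Product over t of p_{s_t}(t), in the sense of relation 2.2).

    responses[t-1] is s_t (numbered from 1), events[t-1] is r_t, and
    p(t+1) = operators[(r_t, s_t)] p(t) as in relation 1.1).
    """
    p = list(p1)
    value = 1.0
    for s, r in zip(responses, events):
        value *= p[s - 1]            # probability of the response actually made
        M = operators[(r, s)]
        p = [sum(row[j] * p[j] for j in range(len(p))) for row in M]
    return value


# Hypothetical two-choice operators; the entries are illustrative only.
operators = {
    (0, 1): [[0.8, 0.3], [0.2, 0.7]],
    (0, 2): [[0.7, 0.2], [0.3, 0.8]],
    (1, 1): [[0.9, 0.4], [0.1, 0.6]],
    (1, 2): [[0.6, 0.1], [0.4, 0.9]],
}

L = likelihood(operators, [0.5, 0.5], responses=[1, 1, 2], events=[1, 0, 0])
```

In an actual estimation one would treat the operator entries and $p_k(1)$ as unknowns and search for the values that make this product largest.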

There is nothing new in what I have said so far, of course; these
expository remarks simply provide the background for what follows.

3.  Symmetry  and  Scope 

If the theory is to be of much interest, and wide use, it
must provide a valid description of a broad class of organismic
behavior; the scope of the theory must be specified in terms of
bounds for the class of behavioral situations explained by the
theory. In particular, it should be possible to verify the theory
by testing it for only some situations within a well-defined
subclass before using it confidently for predictions concerning
the remaining situations of the subclass. I believe this is what
is meant by the term theory in scientific usage.*

* "In scientific usage, a HYPOTHESIS is a provisional
conjecture regarding the causes or relations of certain phenomena;
a THEORY is a hypothesis which has undergone verification, and
which is applicable to a large number of related phenomena,"
Webster's New International Dictionary, Second Edition, Unabridged,
G. and C. Merriam Company, Springfield, Mass., 1951, p. 2620.

As a special case, we might consider the subclass of all
two-choice situations for all animals. Still more specially,
we might consider the subclass of all two-choice situations for
some one human, consisting at any one time of either: a) doing
some particular thing (such as blinking), or b) not doing that
thing. There is a serious problem in identifying this choice
class at any one time with what appears to be the same situation
at another time, but the important thing is to be able to do this
so well that the theory does in fact check out closely and often;
deviations between theoretical and observed behavior must then
be sorted out after the fashion of the statistician, and if after
probing they seem to be unexplainable and satisfactorily small
then the theory is considered to be valid for such purposes.

If we were lucky with the stochastic learning theory we
might find a large class of human choice experiments explained
by it in this sense. For instance, if the estimates of the
parameters $M^{rs}_{ij}$ were found to agree well in repeated trials with
the same person in some one experimental learning situation then
we would accept this as evidence that the $M^{rs}_{ij}$ were physical
constants and characteristic of the situation. If these same
values were found for many types of humans, but another set of
constants was found repeatedly for rats in the same experimental
situation, then we would accept this happily as evidence that
our theory had still wider scope. Thus, scientific development
consists in increasing the extent of the subclass of situations
that can be explained reliably by each hypothesis and in sharpening
the boundary between this subclass and others in which the
hypothesis fails.

And so with the stochastic learning theory. A good first
step would be to find any experiment with humans, involving
choices and rewards, that can be repeated over and over again
yet always yielding essentially constant $\hat M^{rs}_{ij}$. Then a good second
step would be confirmation of this constancy in quite a different
experimental situation. As an example, suppose that the $M^{rs}_{ij}$
could be estimated reliably for a rat and a man with the same
[...], just as they can both be weighed on the
same scales, and that these estimates [...] and used
successfully to predict the amounts that the man and the rat
would each win playing cooperatively in some carefully selected
non-zero sum game; then this result would increase our confidence
in the validity of the learning theory in such situations.

Turn now from scope to symmetry, and start with the notion
that there must have been a first occurrence for each
choice-situation met by the organism. On the first occurrence,
essentially by definition, there would be no way for the organism to
have a bias in favor of any single choice. Furthermore, if the
theory is to be of use, the numbering of the choices is arbitrary
and the validity of the theory cannot be dependent upon the
numbering actually selected. In other words, the matrices $M^{rs}$
must be such that any two vectors $q(t)$ and $q(t+1)$ obtained by
applying the same permutation to the components of $p(t)$ and $p(t+1)$
must satisfy the relation

$q(t+1) = M^{rs}\, q(t)$

whenever

$p(t+1) = M^{rs}\, p(t),$

and provided that the permutation leaves the $s^{th}$ component
unchanged. In the next section, where this assumption is stated
more precisely, it is shown that symmetry restricts the matrices
$M^{rs}$ very considerably; the number of independent parameters is
reduced to three for each event class whenever there are more
than two choices, and to two for each event class otherwise.


4.  Symmetrical Model


We start with stochastic matrices $M^{rs}$ for $r = 0, 1$ and
$s = 1, 2, \ldots, m$, as in Section 1. In this section, it will sometimes
be convenient to omit the superscript $r$ when the argument is
independent of this distinction. Furthermore, it will be enough
to make the argument for some one value of $s$, say $s = 1$, since an
exactly similar argument holds for other values of $s$; so we also
omit the superscript $s$, with the understanding that we are
discussing only $M^{r1}$ explicitly in this section, and consider the
elements $M_{ij}$ of this typical stochastic matrix.

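Our symmetry assumption is now equivalent to the condition

4.1)  $TMp(t) = MTp(t),$

where $T$ is any permutation matrix such that $T_{11} = 1$ and where
$p(t)$ is any probability vector. Since $p(t)$ can be any unit
vector, an equivalent condition is simply that $M$ must commute
with every $T$; we may set

4.2)  $TM = MT.$

An equivalent requirement is, therefore, that

4.3)  $M_{xy} = M_{\tau(x)\,\tau(y)}$  for $x, y = 1, 2, \ldots, m$,

where $\tau$ is any permutation of the indices that leaves the index 1
unchanged. It follows easily from 4.3), when $m > 2$, that $M$ must be
of the form

4.4)
$$M = \begin{pmatrix}
1-(m-1)a & b & b & \cdots & b \\
a & c & \frac{1-b-c}{m-2} & \cdots & \frac{1-b-c}{m-2} \\
a & \frac{1-b-c}{m-2} & c & \cdots & \frac{1-b-c}{m-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a & \frac{1-b-c}{m-2} & \frac{1-b-c}{m-2} & \cdots & c
\end{pmatrix}$$

where $a$, $b$, and $c$ are three parameters subject only to the
restrictions:

4.5)  $0 \le a \le 1/(m-1)$,
      $0 \le b \le 1$,
      $0 \le c \le 1$,
      $0 \le b + c \le 1$.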
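
The symmetry requirement can be checked numerically. The sketch below assumes the reading of 4.4) in which column 1 is $(1-(m-1)a, a, \ldots, a)$ and every other column carries $b$ in row 1, $c$ on the diagonal, and $(1-b-c)/(m-2)$ elsewhere; the function names and parameter values are hypothetical. Exact fractions are used so that equality tests are exact.

```python
from fractions import Fraction
from itertools import permutations


def symmetric_matrix(m, a, b, c):
    """Matrix of the assumed form 4.4) for s = 1 and m > 2 (list of rows)."""
    d = (1 - b - c) / (m - 2)
    M = [[d] * m for _ in range(m)]
    M[0][0] = 1 - (m - 1) * a
    for x in range(1, m):
        M[0][x] = b      # first row, columns 2..m
        M[x][0] = a      # first column, rows 2..m
        M[x][x] = c      # remaining diagonal
    return M


def commutes_with_relabelings(M):
    """Check M[tau(i)][tau(j)] == M[i][j] for every permutation tau fixing index 0."""
    m = len(M)
    for rest in permutations(range(1, m)):
        tau = (0,) + rest
        if any(M[tau[i]][tau[j]] != M[i][j]
               for i in range(m) for j in range(m)):
            return False
    return True


M = symmetric_matrix(4, Fraction(1, 5), Fraction(1, 4), Fraction(1, 2))
column_sums = [sum(M[i][j] for i in range(4)) for j in range(4)]
symmetric = commutes_with_relabelings(M)
```

Here index 0 in the code plays the role of response 1 in the text. Checking every permutation is only practical for small $m$, which is enough for illustration.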


When $m = 2$, there are only two parameters, and $M$ is of the form

4.6)  $M = \begin{pmatrix} 1-a & b \\ a & 1-b \end{pmatrix},$

where $a$ and $b$ are arbitrary within the closed unit interval.
Of course, the general symmetrical model $M^{rs}$ is obtained by
permuting the $1^{st}$ and $s^{th}$ rows and the $1^{st}$ and $s^{th}$ columns of
$M(a^r, b^r, c^r)$ as defined by the relations 4.4) and 4.6).
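
The permutation step just described can be sketched as follows. The helper name `general_symmetric` is hypothetical, and the two-choice matrix is taken in the form $\begin{pmatrix} 1-a & b \\ a & 1-b \end{pmatrix}$, which is an assumed reading of 4.6).

```python
from fractions import Fraction


def general_symmetric(M1, s):
    """Swap rows 1 and s and columns 1 and s (1-based) of the s = 1 matrix."""
    m = len(M1)
    index = list(range(m))
    index[0], index[s - 1] = index[s - 1], index[0]
    return [[M1[index[i]][index[j]] for j in range(m)] for i in range(m)]


a, b = Fraction(1, 3), Fraction(1, 5)
M_s1 = [[1 - a, b],
        [a, 1 - b]]
M_s2 = general_symmetric(M_s1, 2)
```

For two choices this yields the matrix with rows $(1-b, a)$ and $(b, 1-a)$, so the roles of the two responses are simply exchanged.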


5.  Some Comparisons

Bush and Mosteller* have proposed a specialization of the
general operator of Section 1, called by them the "combining
classes" model, which is determined by the requirement that
condition 1.1) reduce to the condition

5.1)  $p_i(t+1) = a^{rs}\, p_i(t) + b^{rs}_i.$


Of course, the parameters appearing in 5.1) are also subject to
the restrictions

5.2)  $\sum_{i=1}^{m} b^{rs}_i = 1 - a^{rs}$,
      $-\min\,[\,b^{rs}_1, b^{rs}_2, \ldots, b^{rs}_m\,] \le a^{rs} \le 1$,
      $0 \le b^{rs}_i \le 1$.


In matrix form, as has been shown by Bush and Thompson,* 5.1) and
5.2) are equivalent to

5.3)  $M^{rs} = a^{rs} I + (1 - a^{rs})\,\Lambda^{rs},$

where every column of $\Lambda^{rs}$ is the same vector, the $\lambda^{rs}_i$ are
components of an arbitrary probability vector, and $a^{rs}$ is subject
only to the restriction

$1 \ge a^{rs} \ge -\min_j\,[\,\lambda^{rs}_j / (1 - \lambda^{rs}_j)\,].$

It is obvious, on comparison of 4.4) and 5.3), that the
symmetrical model is of combining classes form if and only if
$a^r = (1 - b^r - c^r)/(m-2)$ in 4.4). On the other hand, the combining
classes form 5.3) has symmetry if and only if $a^{rs}$ and $\lambda^{rs}_s$ are
independent of $s$ and $\lambda^{rs}_i = (1 - \lambda^{rs}_s)/(m-1)$ if $i \ne s$.
Thus, neither the symmetrical model nor the combining classes
model can be obtained from the other by specializing parameter
values. And, of course, the "symmetrical combining classes"
model, defined by 4.4) and the condition that $a^r = (1 - b^r - c^r)/(m-2)$,
satisfies both the combining classes and symmetry assumptions.
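
The overlap between the two families can be checked on a small case. The sketch below rests on two assumptions that should be read as such: that the combining classes operator has entries $\alpha\,\delta_{ij} + (1-\alpha)\lambda_i$ for a probability vector $\lambda$, and that the symmetrical matrix has the form described for 4.4) with $c$ on the diagonal. All names and numbers are illustrative.

```python
from fractions import Fraction


def symmetric_matrix(m, a, b, c):
    """Assumed form 4.4) for s = 1 and m > 2 (list of rows)."""
    d = (1 - b - c) / (m - 2)
    M = [[d] * m for _ in range(m)]
    M[0][0] = 1 - (m - 1) * a
    for x in range(1, m):
        M[0][x], M[x][0], M[x][x] = b, a, c
    return M


def combining_classes_matrix(alpha, lam):
    """Entries alpha * delta_ij + (1 - alpha) * lam_i (assumed form of 5.3)."""
    m = len(lam)
    return [[(alpha if i == j else 0) + (1 - alpha) * lam[i]
             for j in range(m)] for i in range(m)]


m = 4
b, c = Fraction(1, 4), Fraction(1, 2)
a = (1 - b - c) / (m - 2)            # condition under which the forms agree
alpha = 1 - (m - 1) * a - b
lam = [b / (1 - alpha)] + [a / (1 - alpha)] * (m - 1)
same = symmetric_matrix(m, a, b, c) == combining_classes_matrix(alpha, lam)
```

With any other value of `a`, row 2 of the symmetrical matrix no longer has equal off-diagonal entries, so it cannot be written in the combining classes form.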

Still more specialized models considered by Bush and
Thompson*, in which I shall be interested, include the two
following, when written as special cases of 4.4):

Pure Model:  [...]

Mixed Model:  [...]


6.  The Error Problem

There is an important sense in which none of our models is
yet suitable for experimental verification; we have made no
allowance for observational error [...]. Although we have used the
likelihood method in Section 2 to provide an operational
definition of our parameters, and even though the process with
which we are dealing is stochastic, we still must
include some ordinary error parameters in our theoretical model
before we can complete the estimation calculations in meaningful
terms. I shall now introduce a new learning model, including an
error parameter, to clarify this point.

As in Section 1, the component $p^*_i(t)$ of an $m$-dimensional
probability vector $p^*(t)$ represents the probability of response
$i$ after moment $t$. Now we introduce an auxiliary $m$-dimensional
probability vector $q(t)$, with components $q_i(t)$, that satisfies
the stochastic relation

6.1)  $q(t+1) = M^{r_t s_t}\, q(t),$

where $M^{r_t s_t}$ has the same definitions as in Section 1.

Finally, we define $p^*(t)$ by the relations

6.2)  $p^*(t) = q(t)$  if $r_{t-1} = 0$,
      $p^*(t) = e_{s_{t-1}}$  if $r_{t-1} = 1$,

where $e_x$ is the probability vector whose $x^{th}$ component is unity;
in words, this means that the organism never changes its choice
after a rewarded response. It is immediately apparent that the
likelihood of 2.2) will be zero with this model for any set of
observational data in which there is even a single exception to
the rule "don't change on a winner;" and this will be the case
even if the apparent exception is due to a clerical error.

We will modify the model by adding a new parameter $\theta$
representing the probability that at any moment $t$, after a
rewarded choice $s_{t-1}$, a choice $s_t \ne s_{t-1}$ is made at
random among the $(m-1)$ alternatives rather than according to the
probability components of $p^*(t)$. This leads to a probability
vector $p(t)$ defined by the relation

6.3)  $p(t) = q(t)$  if $r_{t-1} = 0$,
      $p(t) = (1-\theta)\,e_{s_{t-1}} + \frac{\theta}{m-1}\,[\,J - e_{s_{t-1}}\,]$  if $r_{t-1} = 1$,

where $J$ is the $m$-dimensional vector whose components are all
unity. The new likelihood function is then

6.4)  $L[M^{rs}, p_k(1), \theta] = \big[\prod_{t'} p_{s_{t'}}(t')\big]\,\big[(1-\theta)^{A}\,\big(\tfrac{\theta}{m-1}\big)^{B}\big],$

where the product extends over the moments $t'$ that do not follow a
rewarded choice, $A$ is the number of times $s_{t''} = s_{t''-1}$ after a
rewarded choice, and $B$ is the number of times $s_{t''} \ne s_{t''-1}$ after
a rewarded choice. It is easily seen that the
value of $\theta$ that maximizes this likelihood is

6.5)  $\theta = \frac{B}{A+B}.$

With the statistical parameter $\theta$ included, a clerical error
after a rewarded choice need no longer make the likelihood
zero.
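
The estimate $B/(A+B)$ only requires counting repeats and changes after rewarded choices. A minimal sketch, with a hypothetical function name and made-up data, might look like this:

```python
def error_rate_estimate(responses, rewards):
    """Count repeats (A) and changes (B) after rewarded choices; return A, B, B/(A+B).

    responses[t] is the choice at step t and rewards[t] is 1 if that choice
    was rewarded, 0 otherwise. The estimate is None if no choice was rewarded
    before the last step.
    """
    A = B = 0
    for t in range(1, len(responses)):
        if rewards[t - 1] == 1:
            if responses[t] == responses[t - 1]:
                A += 1
            else:
                B += 1
    estimate = B / (A + B) if A + B > 0 else None
    return A, B, estimate


# Illustrative data only: five successive choices and their outcomes.
A, B, theta_hat = error_rate_estimate([1, 1, 2, 2, 1], [1, 1, 0, 1, 0])
```

A single change after a winner now produces a small positive estimate instead of forcing the likelihood to zero.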

Perhaps a better way to include the error parameters is
to distinguish between those occurring after a rewarded choice
and those after a non-rewarded choice. This can be rationalized
by noting that there are really two rather different types of
errors, one class due to clerical and other observational flaws
and another class due to the inexactness of the theory. For
example, I suspect that changes after a rewarded choice that
are made while the environment is really stationary, and with
no serial correlations, reflect the tendency for organisms to
continually search for evidence of non-stationarity, and for
serial correlation; none of our models is general enough to
match such tendencies.

For a moment then, we will consider a model in which two
statistical parameters appear. We start with any model, like
those specified by 1.1) or 6.2), in which there is a vector
$p(t)$ whose component $p_s(t)$ represents the approximate probability
that the organism will make choice $s$ after moment $t$. We next
suppose that the probability is $\theta^1$ after a rewarded choice, and
$\theta^0$ after a non-rewarded choice, that the organism will not choose
according to the vector $p(t)$; alternatively, the parameters $\theta^1$
and $\theta^0$ can each be thought of as joint probabilities of
theoretical and observational error. The probability vector $\bar p(t)$ that
we should use to represent total behavior is, therefore,

6.6)  $\bar p(t) = \big[(1-\theta^1)\,p(t) + \tfrac{\theta^1}{m} J\big]\, r_{t-1} + \big[(1-\theta^0)\,p(t) + \tfrac{\theta^0}{m} J\big]\,(1 - r_{t-1}).$

The likelihood estimates of $\theta^0$ and $\theta^1$, for this model, cannot
be written explicitly for the general operator $M$ that yields
$p(t+1)$ from $p(t)$; this was possible for the model of 6.5) only
because the factors of the likelihood function involving $\theta$ were
not dependent upon $M$.
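
A short sketch of the two-parameter mixing step may help. It assumes that the error term spreads its probability uniformly, so that the observed vector is $(1-\theta)\,p(t) + (\theta/m)\,J$ with $\theta$ chosen according to whether the previous choice was rewarded; the function name and the numbers are hypothetical.

```python
def total_behavior_vector(p, r_prev, theta_reward, theta_nonreward):
    """Mix p(t) with the uniform vector J/m (assumed reading of relation 6.6)."""
    m = len(p)
    theta = theta_reward if r_prev == 1 else theta_nonreward
    return [(1 - theta) * p_s + theta / m for p_s in p]


# Illustrative values: three choices, previous choice rewarded.
p_bar = total_behavior_vector([0.7, 0.2, 0.1], r_prev=1,
                              theta_reward=0.06, theta_nonreward=0.02)
```

Because every component of the result is at least $\theta/m$, no single observed response can make the likelihood vanish when $\theta > 0$.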

7.  A  Preferred  Model 

I prefer the stochastic learning model defined by 6.3)
and 6.6), where $q(t)$ satisfies a stochastic relation of the
form 1.1) with $M^{rs}$ of the form 4.4) or 4.6). This model is not
of the form 1.1), where $M^{r_t s_t}$ is required to be a stochastic
matrix, but it is of the form 1.1) if $M^{r_t s_t}$ is interpreted
as a more general operator. This "Preferred Model" includes
error parameters, provides directly for the "don't change on
a winner" principle, and restricts the class of operator matrices
as required by the symmetry assumption. The Preferred Model,
for $m > 2$ and starting at moment $t_0$, has the following $m+6$
independent parameters: $p_i(t_0)$ for $i = 1, 2, \ldots, m-1$; $a^j$, $b^j$, $c^j$,
[...] for $j = 1, 2$. We think of the error parameters as relevant to
corrections both for clerical error, and for such theoretical
errors as those due to apparent non-stationarity or serial
correlation in the process.

The main hypothesis is that the theoretical scope for the
Preferred Model is the class of all $m$-choice repetitive situations
for organisms acting in a stationary serially uncorrelated
environment, and one in which each repetitive $m$-choice situation
has a first occurrence at moment $t = 1$. The simplest experiments
to test this hypothesis will probably be those in which the
parameters $p_i(1)$ will all be taken equal to $(1/m)$, on the
assumption that the experimental choice-situation represents a
first occurrence, at least at the start of work with each new
subject.

In a very strong sense, the main hypothesis would be
supported if all the parameters were found to be essentially
constant over a class of situations in which the value of $m$
and the frequencies of reward and non-reward were varied widely.
There would also be good support for the hypothesis if the
parameters $a^j$, $b^j$, and $c^j$ were found to remain essentially
constant over a class of situations in which $m$ was held fixed,
and the initial trial with each subject was controlled to be a
first occurrence with $p_i(1) = (1/m)$, and the frequencies of
reward and non-reward were varied.

Unfortunately, it is not at all necessary that any parameter
explicit in the model be observationally rather constant, over
an appropriate class of experimental situations; it would be
enough if only certain functions of the parameters were
observationally constant. In this sense, one can never reject the
model; one can only note that a particular set of observations,
as they were interpreted in terms of the model, do not lend
support to the main hypothesis. Such a result is non-constructive,
since progress requires success in observing constancies relative
to the model. In all, then, the Preferred Model and the main
hypothesis can never be more than guides for experimentation; in
this sense, what we really have is only a guide to preferred
experiments.

8.  Some  Preferred  Experiments 

There is much to be said for first testing the main
hypothesis for the two-choice situation, since in this case
there are two fewer parameters than when $m > 2$. And it will
certainly be easier to control the starting vector $p(t_0)$ than
to estimate it from the experimental data.

If success is met in the two-choice case, in the sense
that all of the five parameters remain essentially constant,
then it will only be necessary to estimate the two new parameters
in the three-choice case if it should be true that values are
independent of $m$. If all this goes well, it will provide strong
support for the theory if these same constant parameter values
are found as $m$ is increased.

It is reasonable to hope that all the parameters, in the
$m$-choice case, will be essentially constant as the reward and
non-reward frequencies are varied. In this favorable event, it
will be desirable to try other variations in the conditions
surrounding the $m$-choice case, such as amount of reward, in order
to find the experimental bounds within which the model seems to
be valid (its scope) in $m$-choice situations.

Perhaps the most important of all experimental design
considerations is the requirement that the data not only be
adequate to determine the parameters but that the estimation
calculations be manageable. The best way that I can now see to
handle the estimation problem is to keep the number of successive
plays small in any one sequence after starting with a probability
vector assumed known. More specifically, if the total experiment
with one subject consists of $n$ sequences of $N$ plays each, in
which the starting vector in each sequence is $(J/m)$, then $N$
should be kept as small as possible and $n$ should be fairly large.
For example, if $m = 2$ and [...], then the parameters in all the models
considered in this paper are easily estimated if $n$ is large
enough; the method of estimation is illustrated in Section 9.

This very brief outline can only suggest the direction for
preferred experimentation at the start, since later designs must
depend upon the results of earlier tests. Generally speaking,
the object is to proceed from the simple to the more complex as
observed constancies permit this type of development.


9.  Parameter Estimation

The general method of parameter estimation will be
illustrated in this section by calculating some of the formulas
for one model-experiment combination. The method is quite general,
if the experiments are carefully designed for the purpose, and
seems to provide manageable estimation formulas for most of the
models discussed in this paper.

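The model is defined by the following relations:

9.1)  $M^{r1} = \begin{pmatrix} a_r & b_r \\ 1-a_r & 1-b_r \end{pmatrix}, \qquad M^{r2} = \begin{pmatrix} 1-b_r & 1-a_r \\ b_r & a_r \end{pmatrix},$

9.2)  $q(t+1) = \big(r_t\, M^{1 s_t} + (1 - r_t)\, M^{0 s_t}\big)\, q(t),$

9.3)  $p(t) = r_{t-1}\,\big[(1-w)\,e_{s_{t-1}} + w\,(J - e_{s_{t-1}})\big] + (1 - r_{t-1})\, q(t),$

9.4)  $p(1) = q(1) = (J/2).$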


The experimental design provides that:

a.  Each of $n$ trials requires five successive choices
between two alternatives.

b.  The probability of reward on the $i^{th}$ choice in
each trial is constant from trial to trial.

c.  The result of reward or non-reward is in every
instance independent of the choice actually made.

d.  The pattern of reward and non-reward is determined
independently for each trial by means of a table of random
numbers.

In practice, I have used a punchboard with two columns and five
rows. There are 32 possible patterns of reward and non-reward
for one trial, since each of five rows is either rewarded or
non-rewarded, and the choice among these is made according to a
probability distribution determined by 32 probabilities $\varphi_j$; of
course the reward probabilities for the individual rows are
determined uniquely by the $\varphi_j$. Most important
of all, every effort is made to convince the subject that the
trials are independent, so that he will not be influenced in
his behavior in any one trial by his experiences in earlier
trials. Actually, any other (3x2) design that preserves
independence between trials, symmetry between columns, and independence
between and within columns of one trial would provide the data
necessary for estimation, provided all possible reward patterns
appeared sufficiently often.
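
The relation between the 32 pattern probabilities and the reward probability of each row can be sketched as follows. The uniform distribution used here and all names are assumptions for the example; the paper does not state which distribution was used.

```python
import random
from fractions import Fraction
from itertools import product

ROWS = 5
PATTERNS = list(product((0, 1), repeat=ROWS))   # the 32 reward patterns


def row_reward_probabilities(phi):
    """Marginal probability of reward on each row, given pattern probabilities phi."""
    return [sum(phi[pattern] for pattern in PATTERNS if pattern[i] == 1)
            for i in range(ROWS)]


def draw_pattern(phi, rng):
    """Draw one trial's reward pattern, independently of the subject's choices."""
    weights = [float(phi[pattern]) for pattern in PATTERNS]
    return rng.choices(PATTERNS, weights=weights, k=1)[0]


# Hypothetical design: all 32 patterns equally likely.
phi = {pattern: Fraction(1, 32) for pattern in PATTERNS}
marginals = row_reward_probabilities(phi)
pattern = draw_pattern(phi, random.Random(0))
```

The pattern is drawn before the trial and does not depend on which column the subject punches, which is the independence required by condition c.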

The basic observational quantities that we shall use are
frequencies defined as follows:

9.5)  $F^{r_1 r_2 \ldots r_t}_{s_1 s_2 \ldots s_t}(s)$ = Number of times in all $n$ trials
      that $s_{t+1} = s$ was chosen after prior
      choices $s_i$ and results $r_i$.

We shall also need the associated quantities:

9.6)  $T^{r_1 r_2 \ldots r_t}_{s_1 s_2 \ldots s_t}$ = Total number of times in all $n$ trials
      that the choices were $s_i$ and results $r_i$.

9.7)  $R^{r_1 r_2 \ldots r_t}_{s_1 s_2 \ldots s_t}(s) = \dfrac{F^{r_1 r_2 \ldots r_t}_{s_1 s_2 \ldots s_t}(s)}{T^{r_1 r_2 \ldots r_t}_{s_1 s_2 \ldots s_t}}.$
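
The frequencies can be tallied directly from trial records. The sketch below assumes each trial is recorded as a list of (choice, result) pairs and that the total $T$ is the sum of the $F(s)$ over the two alternatives; the data layout and function names are hypothetical.

```python
from collections import Counter


def frequencies(trials):
    """Tally F keyed by (prior results, prior choices, next choice)."""
    F = Counter()
    for trial in trials:
        choices = [s for s, _ in trial]
        results = [r for _, r in trial]
        for t in range(1, len(trial)):
            F[(tuple(results[:t]), tuple(choices[:t]), choices[t])] += 1
    return F


def total(F, results, choices, alternatives=(1, 2)):
    """T: number of times the prior choices and results occurred."""
    return sum(F[(results, choices, s)] for s in alternatives)


def relative(F, results, choices, s):
    """R(s) = F(s) / T, or None when the history never occurred."""
    T = total(F, results, choices)
    return F[(results, choices, s)] / T if T else None


# Three made-up trials of three choices each.
trials = [
    [(1, 0), (2, 0), (1, 1)],
    [(1, 0), (1, 1), (1, 0)],
    [(1, 0), (2, 0), (2, 0)],
]
F = frequencies(trials)
```

For instance, `F[((0,), (1,), 2)]` counts how often choice 2 followed a non-rewarded choice 1 on the first play.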


The principle of the likelihood estimation method is to
calculate theoretical probabilities $H^{r_1 \ldots r_t}_{s_1 \ldots s_t}(s)$, as functions of some
or all of the parameters of a model, corresponding to each of
several independent observables $F^{r_1 \ldots r_t}_{s_1 \ldots s_t}(s)$. Then the likelihood
function $L_N = L_N(a_r, b_r, w)$, for $n$ trials and $N$ choices each trial,
is defined by the relation:

9.8)  $\log L_N = \sum_{t=1}^{N}\ \sum_{r_i=0}^{1}\ \sum_{s_i,\,s=1}^{2} F^{r_1 \ldots r_t}_{s_1 \ldots s_t}(s)\, \log H^{r_1 \ldots r_t}_{s_1 \ldots s_t}(s).$

The required parameter estimates are the values of $a_r$, $b_r$, and
$w$ that maximize $L_N$, where each parameter is restricted to the
closed unit-interval.


Unbiased estimates may also be obtained, though at a
sacrifice of precision, by using any subproduct of $L_N$ that
includes the term [...] if the term [...] is present. This
amounts to the assumption that the $H^{r_1 \ldots r_t}_{s_1 \ldots s_t}(s)$ are independent
distribution functions of $s$. It is sometimes convenient to
make use of this selection principle, in order to simplify
numerical calculations, when the full set of parameters can
thereby be broken into subsets for estimation purposes.

As an example, consider the case $N = 2$; then

9.9)  $\log L_2 = \sum_{r_1=0}^{1}\ \sum_{s_1,\,s=1}^{2} F^{r_1}_{s_1}(s)\, \log H^{r_1}_{s_1}(s) + \sum_{r_1,\,r_2=0}^{1}\ \sum_{s_1,\,s_2,\,s=1}^{2} F^{r_1 r_2}_{s_1 s_2}(s)\, \log H^{r_1 r_2}_{s_1 s_2}(s).$
If only the terms involving $r_i = 0$ and $s_2 \ne s_1$ are used, then just
the two parameters $a_0$ and $b_0$ will appear, and the likelihood
expression becomes

9.10)  $\log L(a_0, b_0) = \sum_{s=1}^{2} \big[\,F^{0}_{1}(s)\, \log H^{0}_{1}(s) + F^{00}_{12}(s)\, \log H^{00}_{12}(s)\,\big],$

where it is also assumed that $F^{0}_{1}(s) > 0$ and $F^{00}_{12}(s) > 0$. We note
that:

9.11)  $2H^{0}_{1}(1) = e_1'\, M^{01} J = a_0 + b_0,$

9.12)  $2H^{00}_{12}(1) = e_1'\, M^{02} M^{01} J = 1 + (1 - a_0 - b_0)(1 - a_0 + b_0).$
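
The two probabilities can be verified by direct matrix multiplication. The sketch assumes the two-choice operators $M^{r1}$ with rows $(a_r, b_r)$, $(1-a_r, 1-b_r)$ and $M^{r2}$ with rows $(1-b_r, 1-a_r)$, $(b_r, a_r)$, and a starting vector $J/2$; the parameter values are arbitrary and the function names are hypothetical.

```python
from fractions import Fraction


def M_r1(a, b):
    """Operator applied after response 1 (assumed reading of 9.1)."""
    return [[a, b], [1 - a, 1 - b]]


def M_r2(a, b):
    """Operator applied after response 2 (rows and columns of M_r1 exchanged)."""
    return [[1 - b, 1 - a], [b, a]]


def apply_operator(M, q):
    return [sum(row[j] * q[j] for j in range(len(q))) for row in M]


a0, b0 = Fraction(1, 3), Fraction(1, 5)
q1 = [Fraction(1, 2), Fraction(1, 2)]          # starting vector J/2
q2 = apply_operator(M_r1(a0, b0), q1)          # after a non-rewarded choice 1
q3 = apply_operator(M_r2(a0, b0), q2)          # then a non-rewarded choice 2

check_first = (2 * q2[0] == a0 + b0)
check_second = (2 * q3[0] == 1 + (1 - a0 - b0) * (1 - a0 + b0))
```

Both checks hold for any admissible $a_0$ and $b_0$, since the identities are algebraic; exact fractions avoid rounding in the comparison.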


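It follows easily, dropping the subscripts on $a_0$ and $b_0$, that:

9.13)  $\dfrac{\partial \log L}{\partial a} = \dfrac{F^{0}_{1}(1)}{a+b} - \dfrac{F^{0}_{1}(2)}{2-a-b} + \dfrac{(2a-2)\,F^{00}_{12}(1)}{1+(1-a-b)(1-a+b)} - \dfrac{(2a-2)\,F^{00}_{12}(2)}{1-(1-a-b)(1-a+b)},$

9.14)  $\dfrac{\partial \log L}{\partial b} = \dfrac{F^{0}_{1}(1)}{a+b} - \dfrac{F^{0}_{1}(2)}{2-a-b} - \dfrac{2b\,F^{00}_{12}(1)}{1+(1-a-b)(1-a+b)} + \dfrac{2b\,F^{00}_{12}(2)}{1-(1-a-b)(1-a+b)}.$

If we set

$\dfrac{\partial \log L}{\partial a} = \dfrac{\partial \log L}{\partial b} = 0,$

and solve for $a$ and $b$ with the conditions $R^{0}_{1}(1) \ne R^{0}_{1}(2)$,
$R^{00}_{12}(1) \ne R^{00}_{12}(2)$, and $a + b \ne 1$, we obtain

9.15)  $\dfrac{F^{0}_{1}(1)}{a+b} = \dfrac{F^{0}_{1}(2)}{2-a-b}$

and

9.16)  $\dfrac{F^{00}_{12}(1)}{1+(1-a-b)(1-a+b)} = \dfrac{F^{00}_{12}(2)}{1-(1-a-b)(1-a+b)},$

whence

9.17)  $a = \dfrac{2[R^{0}_{1}(1)]^2 + R^{00}_{12}(1) - 1}{2R^{0}_{1}(1) - 1},$

9.18)  $b = \dfrac{2[R^{0}_{1}(1)]^2 - 2R^{0}_{1}(1) - R^{00}_{12}(1) + 1}{2R^{0}_{1}(1) - 1}.$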
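
The closed-form solution can be checked against the two stationarity conditions. In this sketch the counts are invented so that the solution falls inside the unit interval, and the function name is hypothetical; the formulas are those of 9.17) and 9.18) as read above.

```python
from fractions import Fraction


def interior_estimates(F1, F12):
    """Solve 9.15) and 9.16) for a and b.

    F1 = (F^0_1(1), F^0_1(2)) and F12 = (F^00_12(1), F^00_12(2)).
    """
    R1 = Fraction(F1[0], F1[0] + F1[1])
    R12 = Fraction(F12[0], F12[0] + F12[1])
    if 2 * R1 == 1:
        raise ValueError("R1 = 1/2: the closed form does not apply")
    a = (2 * R1 ** 2 + R12 - 1) / (2 * R1 - 1)
    b = (2 * R1 ** 2 - 2 * R1 - R12 + 1) / (2 * R1 - 1)
    return a, b


# Made-up counts, chosen only so that both estimates lie in [0, 1].
F1, F12 = (30, 50), (33, 27)
a, b = interior_estimates(F1, F12)

d = (1 - a - b) * (1 - a + b)
satisfies_915 = Fraction(F1[0]) / (a + b) == Fraction(F1[1]) / (2 - a - b)
satisfies_916 = Fraction(F12[0]) / (1 + d) == Fraction(F12[1]) / (1 - d)
```

When the formulas return a value outside the unit interval, the maximum must instead be sought on the boundary of the parameter square.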


If either $R^{0}_{1}(1) = R^{0}_{1}(2)$ or $R^{00}_{12}(1) = R^{00}_{12}(2)$ then all solutions
of 9.15) and 9.16) satisfy the condition $a + b = 1$, and we shall
consider this case separately. We are left with the result that
the function $L_2$, defined by 9.10), does not attain its maximum
value within the restricted range of the parameters $a_0$ and $b_0$
except possibly when:

a.  $a_0$ and $b_0$ satisfy 9.17) and 9.18) within the restricted range, or
b.  $a_0 = 0$, or
c.  $b_0 = 0$, or
d.  $a_0 = 1$, or
e.  $b_0 = 1$, or
f.  $a_0 + b_0 = 1$.

It is easily seen, remembering that $F^{0}_{1}(s) > 0$, that

$L_2(0,0) = L_2(1,1) = 0,$

$L_2(a_0, 1-a_0) = 2^{-(T^{0}_{1} + T^{00}_{12})} > 0,$

$L_2(a_0, 0) = 2^{-(T^{0}_{1} + T^{00}_{12})}\; a_0^{\,F^{0}_{1}(1) + F^{00}_{12}(2)}\; (2 - a_0)^{F^{0}_{1}(2) + F^{00}_{12}(2)}\; \big(1 + [1 - a_0]^2\big)^{F^{00}_{12}(1)},$

and that $L_2(a_0, 0) > L_2(a_0, 1-a_0)$ for some value of $a_0$ in the open
unit-interval. Consequently, it follows that $L_2$ attains its
maximum value, within the restricted range for $a_0$ and $b_0$, only if:

9.19)
a.  $a_0$ and $b_0$ satisfy 9.17) and 9.18), or
b.  $a_0 = 0$ and $b_0$ is in the open unit-interval, or
c.  $b_0 = 0$ and $a_0$ is in the open unit-interval, or
d.  $a_0 = 1$ and $b_0$ is in the open unit-interval, or
e.  $b_0 = 1$ and $a_0$ is in the open unit-interval.

It is a tedious but straightforward calculation to determine the
estimated values with the help of the conditions 9.19).

In one pilot experiment, the observed values were

$F^{0}_{1}(1) = 2$,  $F^{0}_{1}(2) = 80$,  $F^{00}_{12}(1) = 41$,  $F^{00}_{12}(2) = 19$,

and so

$R^{0}_{1}(1) = 1/41$  and  $R^{00}_{12}(1) = 41/60$.

Using 9.17) and 9.18), we have approximately $a_0 = 0.33$ and
$b_0 = -0.28$, so that $b_0$ falls outside the unit-interval.
Consequently, we must consider the possibilities:

$2^{142} L(0,b) = b^{40}(2-b)^{80}(2-b^2)^{41} = H_1(b),$
$2^{142} L(a,0) = a^{21}(2-a)^{99}(2-2a+a^2)^{41} = H_2(a),$
$2^{142} L(1,b) = (1+b)^{43}(1-b)^{121}(1+b^2)^{19} = H_3(b),$
$2^{142} L(a,1) = (1+a)^{2}(1-a)^{162}(1+2a-a^2)^{19} = H_4(a).$

The following inequalities are easily verified:

$H_4(a) < 2^{[...]}$,  $H_3(b) < 2^{[...]}$,  $H_2(1/5) > 10^{[...]} > 2^{[...]}$,
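
The boundary search can also be carried out numerically. The sketch below assumes the likelihood of 9.10) with $2H^{0}_{1}(1) = a+b$ and $2H^{00}_{12}(1) = 1+(1-a-b)(1-a+b)$, uses pilot counts of 2, 80, 41 and 19, and scans a coarse grid over the closed unit square; the grid step and all names are choices made for the example.

```python
import math

# Pilot counts: F^0_1(1), F^0_1(2) and F^00_12(1), F^00_12(2).
F1 = (2, 80)
F12 = (41, 19)


def log_likelihood(a, b):
    """log L(a, b) under the assumed forms of H; -inf where L vanishes."""
    d = (1 - a - b) * (1 - a + b)
    probabilities = [(a + b) / 2, (2 - a - b) / 2, (1 + d) / 2, (1 - d) / 2]
    counts = [F1[0], F1[1], F12[0], F12[1]]
    value = 0.0
    for count, p in zip(counts, probabilities):
        if p <= 0:
            return -math.inf
        value += count * math.log(p)
    return value


grid = [i / 100 for i in range(101)]
best_value, a_best, b_best = max(
    (log_likelihood(a, b), a, b) for a in grid for b in grid
)
```

On this grid the largest value is found on the edge $b = 0$, which agrees with the comparison of the four boundary functions; a finer grid or a one-dimensional search along that edge would sharpen the value of $a$.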


It is easily seen that the value of $w$ that maximizes $L(w)$ is

$w = \dfrac{F^{1}_{1}(2)}{F^{1}_{1}(1) + F^{1}_{1}(2)}.$

More generally, if all the experimental data were used to
estimate $w$, it is easily seen that the result is:

$w = \dfrac{f^*}{f + f^*},$

where $f$ is the number of repeats after a winner and $f^*$ is
the number of non-repeats after winners.

For the pilot experiment the estimate is

$w = \dfrac{f^*}{f + f^*} = \dfrac{0}{33} = 0$,  using $N = 3$ only.

Actually, in the pilot experiment, the subject always repeated
after a winner so the estimate is still $w = 0$ when all the data
are used.

The estimation of $a_1$ and $b_1$ requires that data for $N = [...]$
be used, and the calculations are a bit more tedious so the
details are omitted here. The method is exactly analogous to
that just used for obtaining estimates of $a_0$ and $b_0$.

10.  Summary 

A stochastic learning model is proposed in which:

a.  Explicit provision is made for errors of observation.

b.  Separate allowance is made for the "don't change on a
winner" principle.

c.  The number of independent parameters is reduced, from
that in the general matrix operators used by Bush and Mosteller,
by a symmetry assumption.

d.  A preferred model is introduced and discussed, but
it is not likely that the learning theory represented by this
model will have great scope.

e.  The significance of these matters, with respect to
measurement of the physical constants hypothesized by such
stochastic learning models, is discussed in relation to similar
questions pertaining to a few alternative models.

f.  An experiment is described, together with a method for
estimating parameters, that should be adequate to provide a
critical test of the various alternative theories discussed.